MAINT Store data in bytes not io.BytesIO #91

Merged
ryanking13 merged 8 commits into pyodide:main from the bytes-io branch on Dec 28, 2023

ryanking13
Member

This is a split off of #90.

This PR changes how wheel data is stored in the WheelInfo class during package installation: instead of converting the downloaded wheel data to io.BytesIO and storing that, we store it as bytes and convert it to io.BytesIO only when we need to pass it to other methods that accept an io.BytesIO object.
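A minimal sketch of that storage pattern, for illustration only. `StoredArchive` and `make_zip_bytes` are hypothetical names invented here and are not micropip's WheelInfo; the sketch only assumes that a wheel is a zip archive that `zipfile.ZipFile` can read from a file-like object.

```python
import io
import zipfile
from dataclasses import dataclass


def make_zip_bytes() -> bytes:
    """Build a tiny in-memory zip archive to stand in for a downloaded wheel."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("pkg/__init__.py", "VALUE = 1\n")
        zf.writestr("pkg-1.0.dist-info/METADATA", "Name: pkg\n")
    return buf.getvalue()


@dataclass
class StoredArchive:  # hypothetical stand-in, not the real WheelInfo
    data: bytes  # stored as immutable bytes, so there is no read position to manage

    def names(self) -> list[str]:
        # Wrap the bytes in a fresh BytesIO only at the point of use.
        with zipfile.ZipFile(io.BytesIO(self.data)) as zf:
            return zf.namelist()

    def read_member(self, name: str) -> bytes:
        # A second consumer gets its own BytesIO; no seek(0) is needed.
        with zipfile.ZipFile(io.BytesIO(self.data)) as zf:
            return zf.read(name)


archive = StoredArchive(make_zip_bytes())
names = archive.names()
metadata = archive.read_member("pkg-1.0.dist-info/METADATA")
```

Each method creates its own stream, so the order in which callers consume the data no longer matters.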

No functional change is intended; the goal is just to avoid the mistake of forgetting to call seek(0) after reading the data.
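The pitfall can be reproduced with the standard library alone. This is a small sketch with made-up data, not code from this PR:

```python
import io

data = b"wheel contents"

# Storing a BytesIO: the read position is shared state between consumers.
stream = io.BytesIO(data)
first = stream.read()   # consumes the whole stream
second = stream.read()  # position is now at the end, so nothing is returned
stream.seek(0)          # the step that is easy to forget
third = stream.read()

# Storing bytes: each consumer wraps its own BytesIO and starts at position 0.
fresh_a = io.BytesIO(data).read()
fresh_b = io.BytesIO(data).read()
```

Here `second` is empty because the first read left the position at the end of the buffer, while both fresh wrappers return the full data.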

There seems to be a slight overhead from converting bytes to io.BytesIO, but I don't think it is critical.

Benchmark
import io
import timeit

num_iterations = 10000
bytes_size = 100_000
bytes_obj = b'0' * bytes_size

def bench_bytes():
    for i in range(num_iterations):
        io_obj = io.BytesIO(bytes_obj)
        io_obj.read()

def bench_bytesio():
    io_obj = io.BytesIO(bytes_obj)
    for i in range(num_iterations):
        io_obj.seek(0)
        io_obj.read()

print('Benchmarking using a single BytesIO object and seek()...')
print(timeit.timeit(bench_bytesio, number=1))
print('Benchmarking converting bytes to io.BytesIO multiple times...')
print(timeit.timeit(bench_bytes, number=1))
Output (seconds):
Benchmarking using a single BytesIO object and seek()...
0.0006137999998827581
Benchmarking converting bytes to io.BytesIO multiple times...
0.001084399999854213

Member

@hoodmane hoodmane left a comment

Thanks @ryanking13. This is a nice improvement. I'd always much rather be handling a bytes object than a BytesIO.

As long as the argument is a bytes object, creating a BytesIO is effectively free, since it just stores a reference to the bytes object:
https://github.com/python/cpython/blob/v3.11.6/Modules/_io/bytesio.c#L945-L949
So using them only once rather than having to seek(0) all the time is probably a better pattern.
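A quick way to poke at this from Python, as a sketch. The identity check relies on CPython implementation behavior (the linked C code), not on anything the `io` documentation guarantees, so its result is computed but not assumed here; the names are made up for illustration.

```python
import io

payload = b"0" * 100_000

wrapped = io.BytesIO(payload)

# CPython-specific and not guaranteed: an unmodified BytesIO built from an
# exact bytes object may hand back that very object instead of a copy.
shares_buffer = wrapped.getvalue() is payload

# Writing through the BytesIO never changes the original bytes object,
# because bytes are immutable; the BytesIO works on its own buffer instead.
wrapped.write(b"1")
after_write = wrapped.getvalue()

print("same object before writing:", shares_buffer)
print("original first byte:", payload[:1])
print("BytesIO first byte after write:", after_write[:1])
```

Either way, the stored bytes object stays intact no matter what a consumer does with its own BytesIO wrapper.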

Comment on lines 22 to 30
 async def fetch_bytes(url: str, kwargs: dict[str, str]) -> bytes:
     parsed_url = urlparse(url)
     if parsed_url.scheme == "emfs":
-        return open(parsed_url.path, "rb")
-    if parsed_url.scheme == "file":
+        result_bytes = Path(parsed_url.path).read_bytes()
+    elif parsed_url.scheme == "file":
         result_bytes = (await loadBinaryFile(parsed_url.path)).to_bytes()
     else:
         result_bytes = await (await pyfetch(url, **kwargs)).bytes()
-    return BytesIO(result_bytes)
+    return result_bytes
Member

@hoodmane hoodmane Oct 11, 2023

Since we don't do anything with result_bytes anymore, maybe tidy as follows:

async def fetch_bytes(url: str, kwargs: dict[str, str]) -> bytes:
    parsed_url = urlparse(url)
    if parsed_url.scheme == "emfs":
        return Path(parsed_url.path).read_bytes()
    if parsed_url.scheme == "file":
        return (await loadBinaryFile(parsed_url.path)).to_bytes()
    return await (await pyfetch(url, **kwargs)).bytes()

Comment on lines 23 to 24
     response = _fetch(url, kwargs=kwargs)
-    return BytesIO(response.read())
+    return response.read()
Member

@hoodmane hoodmane Oct 11, 2023

Maybe:

    return _fetch(url, kwargs=kwargs).read()

Member

rth commented Dec 25, 2023

@ryanking13 Thanks! Let us know if you plan to address Hood's suggestions, otherwise +1 to merge even as is.

@ryanking13
Member Author

Thanks for the review! I had forgotten about this PR.

@ryanking13 ryanking13 merged commit 48ccefc into pyodide:main Dec 28, 2023
8 checks passed
@ryanking13 ryanking13 deleted the bytes-io branch December 28, 2023 04:58